Overview

Dataset statistics

Number of variables: 13
Number of observations: 800
Missing cells: 386
Missing cells (%): 3.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 75.9 KiB
Average record size in memory: 97.2 B
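
The two derived figures in this block can be checked from the others. The sketch below is only an arithmetic illustration using the numbers reported above; the variable names are hypothetical, and because the reported total size is already rounded to 75.9 KiB, the per-record size is recovered only approximately.

```python
# Figures copied from the dataset statistics above.
n_variables = 13
n_observations = 800
missing_cells = 386
total_size_kib = 75.9  # already rounded in the report

# A "cell" is one value of one variable for one observation.
total_cells = n_variables * n_observations

# Share of cells that are missing, as a percentage.
missing_pct = 100 * missing_cells / total_cells

# Average bytes per observation (1 KiB = 1024 bytes).
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

All 386 missing cells come from the Type 2 column, which is why the per-column missing rate (48.3%) is so much higher than the dataset-wide rate.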

Variable types

NUM: 9
CAT: 3
BOOL: 1

Warnings

Generation is highly correlated with # (High correlation)
# is highly correlated with Generation (High correlation)
Type 2 has 386 (48.3%) missing values (Missing)
# is uniformly distributed (Uniform)
Name has unique values (Unique)

Reproduction

Analysis started: 2020-11-16 00:27:58.497995
Analysis finished: 2020-11-16 00:28:09.755189
Duration: 11.26 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

#
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM

Distinct: 721
Distinct (%): 90.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 362.81375
Minimum: 1
Maximum: 721
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 34.95
Q1: 184.75
median: 364.5
Q3: 539.25
95-th percentile: 689.05
Maximum: 721
Range: 720
Interquartile range (IQR): 354.5

Descriptive statistics

Standard deviation: 208.3437976
Coefficient of variation (CV): 0.574244492
Kurtosis: -1.165705095
Mean: 362.81375
Median Absolute Deviation (MAD): 177.5
Skewness: -0.001122502762
Sum: 290251
Variance: 43407.13798
Monotonicity: Increasing
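
Several of the statistics listed for this variable are simple functions of the others. The sketch below re-derives them from the reported values for "#". It is a consistency check under the usual textbook definitions, not the report generator's own code, and the variable names are hypothetical.

```python
# Figures copied from the statistics reported for the "#" variable.
n = 800
total = 290251
std = 208.3437976
q1, q3 = 184.75, 539.25
minimum, maximum = 1, 721

mean = total / n                 # Mean = Sum / number of observations
cv = std / mean                  # Coefficient of variation = std / mean
variance = std ** 2              # Variance = std squared
iqr = q3 - q1                    # Interquartile range = Q3 - Q1
value_range = maximum - minimum  # Range = Maximum - Minimum

print(mean, cv, variance, iqr, value_range)
```

The same relationships hold for every numeric variable in the report, so they are a quick way to spot a mis-transcribed figure.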
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
479                     6  0.8%
386                     4  0.5%
711                     4  0.5%
710                     4  0.5%
150                     3  0.4%
6                       3  0.4%
413                     3  0.4%
646                     3  0.4%
303                     2  0.2%
302                     2  0.2%
Other values (711)    766  95.8%

Minimum 5 values

Value  Count  Frequency (%)
1          1  0.1%
2          1  0.1%
3          2  0.2%
4          1  0.1%
5          1  0.1%

Maximum 5 values

Value  Count  Frequency (%)
721        1  0.1%
720        2  0.2%
719        2  0.2%
718        1  0.1%
717        1  0.1%

Name
Categorical

UNIQUE

Distinct: 800
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 6.2 KiB

Value                Count  Frequency (%)
Excadrill                1  0.1%
Miltank                  1  0.1%
Trubbish                 1  0.1%
Mismagius                1  0.1%
Mienshao                 1  0.1%
Slowbro                  1  0.1%
Doduo                    1  0.1%
GengarMega Gengar        1  0.1%
Muk                      1  0.1%
WormadamSandy Cloak      1  0.1%
Other values (790)     790  98.8%

Frequencies of value counts

Unique

Unique: 800
Unique (%): 100.0%
Histogram of lengths of the category

Length

Max length: 25
Median length: 8
Mean length: 8.84125
Min length: 3

Type 1
Categorical

Distinct: 18
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Memory size: 6.2 KiB

Value             Count  Frequency (%)
Water               112  14.0%
Normal               98  12.2%
Grass                70  8.8%
Bug                  69  8.6%
Psychic              57  7.1%
Fire                 52  6.5%
Electric             44  5.5%
Rock                 44  5.5%
Dragon               32  4.0%
Ghost                32  4.0%
Other values (8)    190  23.8%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 8
Median length: 5
Mean length: 5.26
Min length: 3

Type 2
Categorical

MISSING

Distinct: 18
Distinct (%): 4.3%
Missing: 386
Missing (%): 48.3%
Memory size: 6.2 KiB

Value             Count  Frequency (%)
Flying               97  12.1%
Ground               35  4.4%
Poison               34  4.2%
Psychic              33  4.1%
Fighting             26  3.2%
Grass                25  3.1%
Fairy                23  2.9%
Steel                22  2.8%
Dark                 20  2.5%
Dragon               18  2.2%
Other values (8)     81  10.1%
(Missing)           386  48.2%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 8
Median length: 3
Mean length: 4.3725
Min length: 3

Total
Real number (ℝ≥0)

Distinct: 200
Distinct (%): 25.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 435.1025
Minimum: 180
Maximum: 780
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.2 KiB

Quantile statistics

Minimum: 180
5-th percentile: 250
Q1: 330
median: 450
Q3: 515
95-th percentile: 630
Maximum: 780
Range: 600
Interquartile range (IQR): 185

Descriptive statistics

Standard deviation: 119.9630398
Coefficient of variation (CV): 0.2757121362
Kurtosis: -0.5074607103
Mean: 435.1025
Median Absolute Deviation (MAD): 85
Skewness: 0.1525299234
Sum: 348082
Variance: 14391.13091
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
600                    37  4.6%
405                    26  3.2%
500                    23  2.9%
580                    23  2.9%
300                    19  2.4%
490                    18  2.2%
525                    16  2.0%
480                    15  1.9%
495                    15  1.9%
330                    15  1.9%
Other values (190)    593  74.1%

Minimum 5 values

Value  Count  Frequency (%)
180        1  0.1%
190        1  0.1%
194        1  0.1%
195        3  0.4%
198        1  0.1%

Maximum 5 values

Value  Count  Frequency (%)
780        3  0.4%
770        2  0.2%
720        1  0.1%
700        9  1.1%
680       13  1.6%

HP
Real number (ℝ≥0)

Distinct: 94
Distinct (%): 11.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 69.25875
Minimum: 1
Maximum: 255
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 35.95
Q1: 50
median: 65
Q3: 80
95-th percentile: 110
Maximum: 255
Range: 254
Interquartile range (IQR): 30

Descriptive statistics

Standard deviation: 25.53466903
Coefficient of variation (CV): 0.368685098
Kurtosis: 7.232078374
Mean: 69.25875
Median Absolute Deviation (MAD): 15
Skewness: 1.568224376
Sum: 55407
Variance: 652.0193226
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
60                     67  8.4%
50                     63  7.9%
70                     57  7.1%
65                     46  5.8%
75                     43  5.4%
80                     43  5.4%
40                     38  4.8%
45                     38  4.8%
55                     37  4.6%
100                    32  4.0%
Other values (84)     336  42.0%

Minimum 5 values

Value  Count  Frequency (%)
1          1  0.1%
10         1  0.1%
20         6  0.8%
25         2  0.2%
28         1  0.1%

Maximum 5 values

Value  Count  Frequency (%)
255        1  0.1%
250        1  0.1%
190        1  0.1%
170        1  0.1%
165        1  0.1%

Attack
Real number (ℝ≥0)

Distinct: 111
Distinct (%): 13.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 79.00125
Minimum: 5
Maximum: 190
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.2 KiB

Quantile statistics

Minimum: 5
5-th percentile: 30
Q1: 55
median: 75
Q3: 100
95-th percentile: 136.2
Maximum: 190
Range: 185
Interquartile range (IQR): 45

Descriptive statistics

Standard deviation: 32.45736587
Coefficient of variation (CV): 0.4108462318
Kurtosis: 0.1697173149
Mean: 79.00125
Median Absolute Deviation (MAD): 20
Skewness: 0.551613748
Sum: 63201
Variance: 1053.480599
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
100                    40  5.0%
65                     39  4.9%
80                     37  4.6%
50                     37  4.6%
85                     33  4.1%
60                     33  4.1%
75                     32  4.0%
70                     31  3.9%
90                     30  3.8%
55                     30  3.8%
Other values (101)    458  57.2%

Minimum 5 values

Value  Count  Frequency (%)
5          2  0.2%
10         3  0.4%
15         1  0.1%
20         8  1.0%
22         1  0.1%

Maximum 5 values

Value  Count  Frequency (%)
190        1  0.1%
185        1  0.1%
180        3  0.4%
170        2  0.2%
165        3  0.4%

Defense
Real number (ℝ≥0)

Distinct: 103
Distinct (%): 12.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 73.8425
Minimum: 5
Maximum: 230
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.2 KiB

Quantile statistics

Minimum: 5
5-th percentile: 35
Q1: 50
median: 70
Q3: 90
95-th percentile: 130
Maximum: 230
Range: 225
Interquartile range (IQR): 40

Descriptive statistics

Standard deviation: 31.18350056
Coefficient of variation (CV): 0.422297465
Kurtosis: 2.72626036
Mean: 73.8425
Median Absolute Deviation (MAD): 20
Skewness: 1.155912303
Sum: 59074
Variance: 972.4107071
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
70                     54  6.8%
50                     49  6.1%
60                     46  5.8%
80                     39  4.9%
40                     36  4.5%
65                     36  4.5%
90                     35  4.4%
100                    33  4.1%
55                     32  4.0%
45                     32  4.0%
Other values (93)     408  51.0%

Minimum 5 values

Value  Count  Frequency (%)
5          2  0.2%
10         1  0.1%
15         4  0.5%
20         4  0.5%
23         1  0.1%

Maximum 5 values

Value  Count  Frequency (%)
230        3  0.4%
200        2  0.2%
184        1  0.1%
180        3  0.4%
168        1  0.1%

Sp. Atk
Real number (ℝ≥0)

Distinct: 105
Distinct (%): 13.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 72.82
Minimum: 10
Maximum: 194
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.2 KiB

Quantile statistics

Minimum: 10
5-th percentile: 30
Q1: 49.75
median: 65
Q3: 95
95-th percentile: 131.05
Maximum: 194
Range: 184
Interquartile range (IQR): 45.25

Descriptive statistics

Standard deviation: 32.72229417
Coefficient of variation (CV): 0.4493586126
Kurtosis: 0.2978936607
Mean: 72.82
Median Absolute Deviation (MAD): 20
Skewness: 0.7446624978
Sum: 58256
Variance: 1070.748536
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
60                     51  6.4%
40                     49  6.1%
65                     44  5.5%
50                     39  4.9%
55                     35  4.4%
45                     33  4.1%
70                     30  3.8%
35                     29  3.6%
85                     27  3.4%
80                     27  3.4%
Other values (95)     436  54.5%

Minimum 5 values

Value  Count  Frequency (%)
10         3  0.4%
15         4  0.5%
20         8  1.0%
23         1  0.1%
24         2  0.2%

Maximum 5 values

Value  Count  Frequency (%)
194        1  0.1%
180        3  0.4%
175        1  0.1%
170        3  0.4%
165        2  0.2%

Sp. Def
Real number (ℝ≥0)

Distinct: 92
Distinct (%): 11.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 71.9025
Minimum: 20
Maximum: 230
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.2 KiB

Quantile statistics

Minimum: 20
5-th percentile: 32.95
Q1: 50
median: 70
Q3: 90
95-th percentile: 120
Maximum: 230
Range: 210
Interquartile range (IQR): 40

Descriptive statistics

Standard deviation: 27.8289158
Coefficient of variation (CV): 0.3870368318
Kurtosis: 1.628394057
Mean: 71.9025
Median Absolute Deviation (MAD): 20
Skewness: 0.8540186115
Sum: 57522
Variance: 774.4485544
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
80                     52  6.5%
50                     50  6.2%
55                     47  5.9%
65                     44  5.5%
60                     43  5.4%
75                     40  5.0%
70                     40  5.0%
90                     36  4.5%
45                     35  4.4%
85                     30  3.8%
Other values (82)     383  47.9%

Minimum 5 values

Value  Count  Frequency (%)
20         6  0.8%
23         1  0.1%
25        11  1.4%
30        20  2.5%
31         1  0.1%

Maximum 5 values

Value  Count  Frequency (%)
230        1  0.1%
200        1  0.1%
160        2  0.2%
154        3  0.4%
150        7  0.9%

Speed
Real number (ℝ≥0)

Distinct: 108
Distinct (%): 13.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 68.2775
Minimum: 5
Maximum: 180
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.2 KiB

Quantile statistics

Minimum: 5
5-th percentile: 25
Q1: 45
median: 65
Q3: 90
95-th percentile: 115
Maximum: 180
Range: 175
Interquartile range (IQR): 45

Descriptive statistics

Standard deviation: 29.06047372
Coefficient of variation (CV): 0.4256229903
Kurtosis: -0.2364366728
Mean: 68.2775
Median Absolute Deviation (MAD): 21
Skewness: 0.3579332951
Sum: 54622
Variance: 844.5111327
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
50                     46  5.8%
60                     44  5.5%
70                     37  4.6%
65                     36  4.5%
30                     35  4.4%
80                     33  4.1%
40                     32  4.0%
90                     31  3.9%
100                    31  3.9%
55                     30  3.8%
Other values (98)     445  55.6%

Minimum 5 values

Value  Count  Frequency (%)
5          2  0.2%
10         3  0.4%
15         9  1.1%
20        15  1.9%
22         1  0.1%

Maximum 5 values

Value  Count  Frequency (%)
180        1  0.1%
160        1  0.1%
150        4  0.5%
145        3  0.4%
140        2  0.2%

Generation
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 6
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.32375
Minimum: 1
Maximum: 6
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 3
Q3: 5
95-th percentile: 6
Maximum: 6
Range: 5
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.6612904
Coefficient of variation (CV): 0.4998241145
Kurtosis: -1.239575758
Mean: 3.32375
Median Absolute Deviation (MAD): 2
Skewness: 0.01425810028
Sum: 2659
Variance: 2.759885795
Monotonicity: Increasing
Histogram with fixed size bins (bins=6)
Common values

Value  Count  Frequency (%)
1        166  20.8%
5        165  20.6%
3        160  20.0%
4        121  15.1%
2        106  13.2%
6         82  10.2%

Minimum 5 values

Value  Count  Frequency (%)
1        166  20.8%
2        106  13.2%
3        160  20.0%
4        121  15.1%
5        165  20.6%

Maximum 5 values

Value  Count  Frequency (%)
6         82  10.2%
5        165  20.6%
4        121  15.1%
3        160  20.0%
2        106  13.2%

Legendary
Boolean

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 800.0 B

Value  Count  Frequency (%)
False    735  91.9%
True      65  8.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
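
The calculation just described can be sketched in a few lines of plain Python. This is an illustration of the textbook formula, not the code the report generator runs; the function name `pearson_r` is hypothetical, and the sample values are the HP and Attack figures of the first five rows shown in the Sample section.

```python
import math


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population covariance and standard deviations; the 1/n factors cancel
    # in the ratio, so sample (1/(n-1)) versions would give the same r.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    # Note: a constant input has zero standard deviation and r is undefined.
    return cov / (std_x * std_y)


hp = [45, 60, 80, 80, 39]
attack = [49, 62, 82, 100, 52]
print(pearson_r(hp, attack))
```

Rescaling or shifting either input (for example `[2 * x + 10 for x in hp]`) leaves the result unchanged, which is the location and scale invariance mentioned above.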

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
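
As a rough sketch of that recipe: replace each value by its rank, then apply the Pearson formula to the ranks. The helper names below are hypothetical, and giving tied values the average of their positions is an assumption (a common convention) that the text above does not specify.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance over the product of standard deviations (1/n factors cancel)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relationship still gives rho = 1.
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```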

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
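
A literal reading of that definition can be sketched as follows. This is the simple variant often called tau-a; it makes no adjustment for ties, whereas libraries such as SciPy default to the tie-adjusted tau-b, so results can differ on data with repeated values. The function name is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
        # direction == 0 means a tie in X or Y; it counts toward neither.
    total_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total_pairs


# Two concordant pairs and one discordant pair out of three.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

The double loop over all pairs is quadratic in the number of observations, which is fine for 800 rows but slow for large datasets.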

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
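
For orientation, the bias-corrected estimator is commonly written as in the sketch below, which works from a contingency table of counts. This is an illustration under that assumption and may differ in detail from what the report generator does; the function name and the example tables are hypothetical.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table (list of rows of counts)."""
    n = sum(sum(row) for row in table)
    r = len(table)      # number of rows
    k = len(table[0])   # number of columns
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic (no continuity correction).
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Corrections: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # degenerate table (a single row or column)
    return math.sqrt(phi2_corr / denom)


print(cramers_v_corrected([[10, 10], [10, 10]]))  # independent categories
print(cramers_v_corrected([[20, 0], [0, 20]]))    # perfectly associated categories
```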

Missing values

Sample

First rows

       #  Name                       Type 1   Type 2  Total   HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
0      1  Bulbasaur                  Grass    Poison    318   45      49       49       65       65     45           1  False
1      2  Ivysaur                    Grass    Poison    405   60      62       63       80       80     60           1  False
2      3  Venusaur                   Grass    Poison    525   80      82       83      100      100     80           1  False
3      3  VenusaurMega Venusaur      Grass    Poison    625   80     100      123      122      120     80           1  False
4      4  Charmander                 Fire     NaN       309   39      52       43       60       50     65           1  False
5      5  Charmeleon                 Fire     NaN       405   58      64       58       80       65     80           1  False
6      6  Charizard                  Fire     Flying    534   78      84       78      109       85    100           1  False
7      6  CharizardMega Charizard X  Fire     Dragon    634   78     130      111      130       85    100           1  False
8      6  CharizardMega Charizard Y  Fire     Flying    634   78     104       78      159      115    100           1  False
9      7  Squirtle                   Water    NaN       314   44      48       65       50       64     43           1  False

Last rows

       #  Name                 Type 1   Type 2  Total   HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
790  714  Noibat               Flying   Dragon    245   40      30       35       45       40     55           6  False
791  715  Noivern              Flying   Dragon    535   85      70       80       97       80    123           6  False
792  716  Xerneas              Fairy    NaN       680  126     131       95      131       98     99           6  True
793  717  Yveltal              Dark     Flying    680  126     131       95      131       98     99           6  True
794  718  Zygarde50% Forme     Dragon   Ground    600  108     100      121       81       95     95           6  True
795  719  Diancie              Rock     Fairy     600   50     100      150      100      150     50           6  True
796  719  DiancieMega Diancie  Rock     Fairy     700   50     160      110      160      110    110           6  True
797  720  HoopaHoopa Confined  Psychic  Ghost     600   80     110       60      150      130     70           6  True
798  720  HoopaHoopa Unbound   Psychic  Dark      680   80     160       60      170      130     80           6  True
799  721  Volcanion            Fire     Water     600   80     110      120      130       90     70           6  True